Abstract: Today, image denoising by thresholding of wavelet coefficients is a commonly used tool for
2D image enhancement. Since the data product of spectroscopic imaging surveys has two spatial and
one spectral dimension, the techniques for denoising have to be adapted to this change in dimensionality.
In this paper we will review the basic method of denoising data by thresholding wavelet coefficients
and implement a 2D-1D wavelet decomposition to obtain an efficient way of denoising spectroscopic
data cubes. We conduct different simulations to evaluate the usefulness of the algorithm as part of a
source finding pipeline.

Keywords: methods: data analysis — techniques: image processing — techniques: spectroscopic 



Table 1: Number of abstracts on ADS containing
the word "wavelet" for given date ranges.

Years          Number of Abstracts
Until 1995     251
1996 - 2000    679
2001 - 2005    1221
Since 2006     1797


1 Introduction 

The use of the wavelet transform in astrophysics
has become very popular in recent years. Table 1
compiles the number of publications on ADS¹ in a
given range of years that have the word "wavelet"
in their abstract. Clearly, wavelets have gained
popularity quickly. Typical applications of
wavelet-transform-based methods are the morphological
separation of sources in images and noise removal. The
success of wavelet-based methods in astrophysics is
in part due to the fact that astrophysical data often
contain information on different angular or spectral
scales. For example, an optical image of a galaxy
contains compact, bright stars as well as extended and
faint emission from the bulge and spiral arms. Multiscale
methods, such as the wavelet transform, make it possible
to investigate the different scales of an image separately
(Starck & Bobin 2010).



The most widely used type of wavelet transformation
is the so-called undecimated or redundant, isotropic
wavelet transformation. This is in part due to the
algorithmic simplicity of the method, but also because
undecimated wavelet transforms have proven to be more
efficient for noise removal than their decimated
counterparts. Apart from that, they also provide a number
of computational advantages when reconstructing an
image from a subset of its wavelet coefficients (Starck
et al. 2010).

In this paper we review the basics of denoising
based on the undecimated wavelet transformation in
Section 2 and present an extension of the wavelet
transform to three-dimensional data as proposed in
Starck et al. (2009). Section 3 describes the implementation
of the transform in C++ along with a description of
where the implementation departs from the original
algorithm. A first application of the algorithm is shown
in Section 4, where we use the algorithm to implement
a source finder and test its performance on simulated
H I galaxies. We close the paper with a summary and
an outlook on future applications and potential
improvements to the algorithm.



2 Wavelet denoising 

The isotropic, undecimated wavelet transform (IUWT)
decomposes data $D(x)$ into $J + 1$ subbands

$$D(x) = c_J(x) + \sum_{j=0}^{J-1} w_j(x) \qquad (1)$$

where $c_J$ is a smooth version of the data and the details
at position $x$ and scale $j$ are contained in the wavelet
coefficients $w_j$. The IUWT can be efficiently calculated
by using the so-called "algorithme à trous"
(Holschneider et al. 1989). To calculate the IUWT
one needs to convolve the input data with increasingly
larger kernels. To calculate the next convolution, the
algorithme à trous convolves the previously convolved
data again with the same kernel with $2^j$ zeros inserted
between the kernel values. For multidimensional
transforms this insertion of zeros is done isotropically in all
dimensions. This allows efficient calculation of even
the largest scales.

¹ NASA Astrophysics Data System, http://adswww.harvard.edu/

At each step, two consecutive convolved versions
of the data, $c_j(x)$ and $c_{j+1}(x)$, are used to calculate the
wavelet coefficients $w_j(x) = c_j(x) - c_{j+1}(x)$, starting
from $c_0(x) = D(x)$. The number of scales is usually chosen
to be $\lfloor \log_2 N_i \rfloor$, where $N_i$ is the number
of samples per dimension in a data set, e.g. an image with
$N_1 \times N_2$ pixels. A more detailed
description can be found in Starck et al. (2010).
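The à trous decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not the C++ implementation discussed later: the B3-spline kernel, the mirrored boundary handling and the function names `atrous_kernel` and `iuwt` are assumptions made for this example, and the kernel taps are placed $2^j$ samples apart, which is the common à trous convention and may differ by one from the zero count implied by the indexing in the text.

```python
import numpy as np
from scipy.ndimage import convolve1d

# B3-spline smoothing kernel. This is an assumed (commonly used) choice;
# the text does not state which kernel is used.
B3_KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def atrous_kernel(scale):
    """Return the base kernel with its taps spaced 2**scale samples apart."""
    step = 2 ** scale
    kernel = np.zeros(4 * step + 1)
    kernel[::step] = B3_KERNEL
    return kernel


def iuwt(data, n_scales):
    """Decompose `data` into n_scales detail bands plus one smooth band."""
    smooth = np.asarray(data, dtype=float)
    details = []
    for j in range(n_scales):
        kernel = atrous_kernel(j)
        smoother = smooth
        # Apply the separable kernel along every axis, so that the
        # dilation of the kernel is the same in all dimensions.
        for axis in range(smooth.ndim):
            smoother = convolve1d(smoother, kernel, axis=axis, mode="mirror")
        details.append(smooth - smoother)  # w_j = c_j - c_{j+1}
        smooth = smoother
    return details, smooth


# Usage on a hypothetical noise image: the subbands add up to the input.
rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
details, smooth = iuwt(image, n_scales=3)
reconstruction = smooth + np.sum(details, axis=0)
```

Because each detail band is a difference of two consecutive smoothings, summing all bands and the final smooth band recovers the input exactly, which is what Equation 1 expresses.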






When using wavelets to denoise data, one assumes
that the signal in the data, e.g. the sources, can be
described by only a few relevant coefficients in each of the
detail subbands $w_j(x)$, i.e. that the signal is sparse in
a given wavelet representation. Consequently, one can
try to detect only the relevant coefficients and
reconstruct the image from those.

The detection is usually based on estimating the
standard deviation $\sigma_j$ of the coefficients in a given
subband and taking only the coefficients with absolute
values above a certain threshold $t\sigma_j$ to be significant;
$t$ is usually chosen to be between 3 and 5. Then, if one
applies Equation 1 with all insignificant coefficients
set to zero, one obtains a denoised approximation of
the data.
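As an illustration of this thresholding step, the following sketch marks the significant coefficients of a single subband and sets the others to zero. The noise level is estimated here with the median absolute deviation, which is an assumption of this example; the text only states that $\sigma_j$ is estimated per subband. The function names are hypothetical.

```python
import numpy as np


def robust_sigma(coeffs):
    """Estimate the noise standard deviation of a subband via the MAD.

    The MAD estimator is an assumption of this sketch, chosen because it
    is barely affected by the few coefficients that carry signal.
    """
    median = np.median(coeffs)
    return 1.4826 * np.median(np.abs(coeffs - median))


def significant(coeffs, t=4.0):
    """Boolean mask of coefficients with |w_j| above t * sigma_j."""
    return np.abs(coeffs) > t * robust_sigma(coeffs)


def hard_threshold(coeffs, t=4.0):
    """Keep significant coefficients, set all others to zero."""
    return np.where(significant(coeffs, t), coeffs, 0.0)


# Hypothetical subband: Gaussian noise plus one strong coefficient.
rng = np.random.default_rng(1)
subband = rng.normal(scale=2.0, size=10_000)
subband[123] = 50.0
mask = significant(subband, t=4.0)
cleaned = hard_threshold(subband, t=4.0)
```

With $t = 4$ only a very small fraction of pure-noise coefficients survives, while the strong coefficient is kept unchanged.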

Since this nonlinear denoising benefits greatly if
iterated a few times, Murtagh et al. (1995) developed the
notion of a multi-resolution support $M$ that contains
information about whether the data has a significant
coefficient at a given location and scale. The
multi-resolution support is defined as follows:



$$M(x, j) = \begin{cases} 1 & \text{if } w_j(x) \text{ is significant} \\ 0 & \text{else} \end{cases} \qquad (2)$$



Using this multi-resolution support, one can
implement the following iterative reconstruction scheme:

1. Detect all significant coefficients $w_j(x)$ and store
this information in the multi-resolution support
$M(x, j)$.

2. Calculate the IUWT of the data $D$ and
reconstruct the image only from the coefficients that
belong to $M$ to obtain $\tilde{D}$.

3. Calculate the residual $R = D - \tilde{D}$.

4. Calculate the IUWT of $R$ and again only retain
the coefficients that belong to $M$. Add this
reconstruction to $\tilde{D}$.

5. Go to step 3 until the desired number of
iterations is reached.

In practice a small number of iterations (< 10) is
sufficient. Many examples of how iteration improves the
denoising process can be found in Starck et al. (2010).
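The five steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the text: a B3-spline kernel, mirrored boundaries, a MAD-based noise estimate, and a reconstruction that always includes the smooth band. The names `iuwt` and `denoise` are hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve1d

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed kernel


def iuwt(data, n_scales):
    """Return (details, smooth) with details of shape (n_scales, ...)."""
    smooth, details = np.asarray(data, dtype=float), []
    for j in range(n_scales):
        kernel = np.zeros(4 * 2**j + 1)
        kernel[:: 2**j] = B3
        nxt = smooth
        for axis in range(smooth.ndim):
            nxt = convolve1d(nxt, kernel, axis=axis, mode="mirror")
        details.append(smooth - nxt)
        smooth = nxt
    return np.array(details), smooth


def denoise(data, n_scales=3, t=4.0, n_iter=5):
    data = np.asarray(data, dtype=float)
    details, smooth = iuwt(data, n_scales)
    # Step 1: build the multi-resolution support M(x, j).
    # Noise level per subband from the MAD (assumption of this sketch).
    data_axes = tuple(range(1, details.ndim))
    sigma = 1.4826 * np.median(np.abs(details), axis=data_axes, keepdims=True)
    support = np.abs(details) > t * sigma
    # Step 2: first reconstruction from the coefficients in M.
    recon = smooth + np.sum(details * support, axis=0)
    for _ in range(n_iter):
        residual = data - recon  # step 3
        res_details, res_smooth = iuwt(residual, n_scales)
        # Step 4: add the part of the residual that lies in M.
        recon = recon + res_smooth + np.sum(res_details * support, axis=0)
    return recon, support


# Usage on a hypothetical image: a Gaussian blob in unit-variance noise.
y, x = np.mgrid[0:64, 0:64]
truth = 10.0 * np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 3.0**2))
rng = np.random.default_rng(0)
noisy = truth + rng.normal(size=truth.shape)
denoised, support = denoise(noisy)
rms_noisy = np.std(noisy - truth)
rms_denoised = np.std(denoised - truth)
```

The support is computed once and reused in every iteration, so the iterations only recover signal at locations and scales that were already flagged as significant.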



2.1 Extension to 2D-1D data

The aforementioned decomposition and reconstruction
works very well if the relevant signal in the data is
isotropic or nearly isotropic. This is true for most
1D and 2D astrophysical data like spectra and images.
In the case of imaging spectroscopic surveys like
the past "H I Parkes All-Sky Survey" (HIPASS; Barnes
et al. 2001; Koribalski et al. 2004), the ongoing
"Effelsberg-Bonn H I Survey" (EBHIS; Kerp et al. 2011; Winkel
et al. 2010), the "Arecibo Legacy Fast ALFA Survey"
(ALFALFA; Giovanelli et al. 2005) and future
surveys with WSRT+Apertif (Oosterloo et al. 2009)
and the Australian SKA Pathfinder (ASKAP; Johnston
et al. 2008), the data is three-dimensional with
two angular and one spectral dimension, which is
referred to as a data cube. Since spatially unresolved
sources can still be resolved spectrally, sources
generally do not have the same size along the three
axes of the data cube. This leads to an anisotropy,
which makes isotropic denoising schemes inefficient,
since the wavelet decomposition does not match the
natural shape of the sources very well. Nonetheless,
the sources can be considered partly isotropic in each
individual spectral slice (channel map) and are also
approximately isotropic along each line of sight. It is
therefore beneficial to split the wavelet decomposition
up into a two-dimensional angular and a one-dimensional
spectral part.

The theoretical foundation for this 2D-1D
transformation is laid out by Starck et al. (2009), who
apply a 2D-1D denoising to data from the Fermi LAT
(Atwood et al. 2009). Fermi LAT data has either two
angular and one spectral or two angular and one
temporal domain. Even though very different from radio
astronomical observations in its noise characteristics,
it is similar to data from imaging spectroscopic surveys
in terms of the dimensionality.

To calculate a wavelet representation that accounts
for this difference in axis type, they first calculate a 2D
IUWT of each channel map and subsequently apply a
1D IUWT along each pixel of this wavelet coefficient
data cube. When applying this decomposition with $J_1$
angular and $J_2$ spectral scales one arrives at a
decomposition of the form

$$\begin{aligned}
D(x) = {} & c_{J_1,J_2}(x) \\
& + \sum_{j_1=0}^{J_1-1} w_{j_1,J_2}(x) + \sum_{j_2=0}^{J_2-1} w_{J_1,j_2}(x) \\
& + \sum_{j_1=0}^{J_1-1} \sum_{j_2=0}^{J_2-1} w_{j_1,j_2}(x)
\end{aligned} \qquad (3)$$

Analogous to Equation 1, $c_{J_1,J_2}(x)$ is the smooth
version at angular scale $J_1$ and spectral scale $J_2$. The
second row contains the coefficients that arise from
either spectral decomposition of the smooth angular
scale or angular decomposition of the smooth spectral
scale. The last sum of coefficients $w_{j_1,j_2}$ contains the
detail of the data at angular scale $j_1$ and spectral scale
$j_2$.

To implement an iterative denoising scheme as
described above, one again has to construct a
multi-resolution support $M(x, j_1, j_2)$, which is now
five-dimensional (two angular dimensions, one spectral
dimension, and two scale indices). Once all significant
coefficients are detected, the iterative reconstruction
can be applied as described in the previous section.
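A compact way to see how the 2D-1D decomposition is assembled is the following sketch, which applies a 2D à trous transform to every channel map and then a 1D transform along the spectral axis of each resulting band. The B3-spline kernel, the mirrored boundaries, the axis order (channel, y, x) and all function names are assumptions of this example. The scale index equal to the number of detail scales denotes the smooth band.

```python
import numpy as np
from scipy.ndimage import convolve1d

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed kernel


def smooth_along(data, scale, axes):
    """Convolve with the dilated kernel along the given axes only."""
    kernel = np.zeros(4 * 2**scale + 1)
    kernel[:: 2**scale] = B3
    for axis in axes:
        data = convolve1d(data, kernel, axis=axis, mode="mirror")
    return data


def iuwt_bands(data, n_scales, axes):
    """Return n_scales detail bands followed by the smooth band."""
    bands, smooth = [], np.asarray(data, dtype=float)
    for j in range(n_scales):
        nxt = smooth_along(smooth, j, axes)
        bands.append(smooth - nxt)
        smooth = nxt
    bands.append(smooth)
    return bands


def iuwt_2d1d(cube, n_angular, n_spectral):
    """2D-1D transform of a cube with axes (channel, y, x).

    Returns a dict keyed by (j1, j2); j1 == n_angular or j2 == n_spectral
    marks the smooth band of the respective transform.
    """
    out = {}
    # Smoothing along axes (1, 2) is a 2D transform of each channel map.
    for j1, angular in enumerate(iuwt_bands(cube, n_angular, axes=(1, 2))):
        # The 1D transform runs along the spectral axis of each band.
        for j2, band in enumerate(iuwt_bands(angular, n_spectral, axes=(0,))):
            out[(j1, j2)] = band
    return out


# Usage on a hypothetical noise cube.
rng = np.random.default_rng(2)
cube = rng.normal(size=(32, 16, 16))
bands = iuwt_2d1d(cube, n_angular=2, n_spectral=3)
total = sum(bands.values())
```

Summing all $(J_1 + 1)(J_2 + 1)$ bands returns the input cube, which corresponds to the four groups of terms in Equation 3.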

3 Implementation 

3.1 Scale selection 

In general, the denoising of data is performed by
decomposing the image with the maximum number of
scales, i.e. $J_i = \lfloor \log_2 N_i \rfloor$. There are,
however, certain advantages in using only a subset of
decomposition scales for both the angular and spectral
regime. This is especially true when the denoised image
only serves as a mask to find sources in the original data
and complete flux reconstruction is not of importance.

The signature of a galaxy in neutral hydrogen sur- 
veys is typically small compared to the dimensions of 
the data cube. It is therefore unlikely that one will 
miss any sources when leaving out all decomposition 
scales that belong to larger spectral or angular scales. 
Especially for single dish observations, the informa- 
tion contained at the very large scales is most likely 
due to baseline errors or radio frequency interference 
(RFI). By neglecting those larger scales, one can sup- 
press those errors in the reconstruction and extract 
sources even from suboptimal data. This property is
investigated in Section 4.3.

For this reason, the reconstruction in our algorithm
is done with a "physical" subset of angular and
spectral scales that are likely to contain the signal of
sources. Likewise, the coefficients $w_{J_1,j_2}$, $w_{j_1,J_2}$ and
the smooth data $c_{J_1,J_2}$, which all contain the
information at the largest scales, are not taken into account
and are not part of the reconstructed data.

Which scales are to be considered physical depends
on the type of source one is looking for. In this paper
we mainly focus on extragalactic objects, namely H I
galaxies, which are typically not larger than a few arc
minutes. Which scales then contain the desired objects
depends on their typical angular and spectral size and
the respective sampling of the dimensions on the voxels
of the data cube.

Additionally, we only reconstruct the data from the
positive wavelet coefficients. This approach differs
from the usual iterative approach, where all significant
wavelet coefficients are used and the negative values of
the reconstruction are set to zero at each iteration. We
noticed that by choosing only the positive coefficients
and using mathematical morphology (see next section)
we suppress the artifacts that arise during partial
reconstruction from wavelet coefficients (Starck et al.
2007). If unsuppressed, these artifacts make the use
of the reconstruction as a mask for source finding
difficult, as they can also lead to merging of sources.
On the other hand, this positivity constraint makes
searching for negative features, e.g. absorption lines,
impossible.
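The combination of scale selection and positivity can be illustrated with a small sketch. It assumes that the subbands and the multi-resolution support are available as dictionaries keyed by the scale pair $(j_1, j_2)$; this data layout and the function name `reconstruct_subset` are hypothetical and only serve the example.

```python
import numpy as np


def reconstruct_subset(bands, support, angular_scales, spectral_scales):
    """Sum positive, significant coefficients over selected scale pairs.

    Bands outside the selected scales (including the smooth bands) are
    simply never visited and therefore do not enter the reconstruction.
    """
    shape = next(iter(bands.values())).shape
    recon = np.zeros(shape)
    for j1 in angular_scales:
        for j2 in spectral_scales:
            coeffs = bands[(j1, j2)]
            keep = support[(j1, j2)] & (coeffs > 0)
            recon += np.where(keep, coeffs, 0.0)
    return recon


# Toy subbands with three voxels each (hypothetical values).
bands = {
    (0, 0): np.array([1.0, -2.0, 3.0]),
    (0, 1): np.array([0.5, 4.0, -1.0]),
    (1, 0): np.array([10.0, 10.0, 10.0]),
}
support = {
    (0, 0): np.array([True, True, False]),
    (0, 1): np.array([True, True, True]),
    (1, 0): np.array([True, True, True]),
}
recon = reconstruct_subset(bands, support, angular_scales=[0], spectral_scales=[0, 1])
```

In this toy case the band at angular scale 1 is excluded by the scale selection, the negative coefficients are dropped by the positivity constraint, and the third voxel of the first band is dropped because it is not in the support.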

3.2 Mathematical morphology 

Another advantage of storing the information on the
significant coefficients in the multi-resolution support
is that one can perform mathematical morphology on
it (Serra 1982). Generally, data cubes are created in
such a way that the sampling of the telescope beam fulfills
the Nyquist sampling theorem (Nyquist 1928), meaning
that it is sampled on at least two pixels in every
direction (including the diagonal). This means that
real sources are larger than a single pixel in the
angular dimensions and are most likely also sampled in
more than one spectral channel. Furthermore, it is well
known that significant structures propagate through
the different scales of the IUWT. Sources will therefore
be present in multiple adjacent coefficients in the
three dimensions of the data as well as in adjacent spatial
and spectral scales, and form connected regions in the
five-dimensional multi-resolution support.

To further suppress the noise in the reconstruction
we perform a five-dimensional morphological opening
of the multi-resolution support. Morphological opening
consists of the successive application of an erosion
followed by a dilation. The former removes an element
from the multi-resolution support if one of its direct
neighbors (considering all five dimensions) is 0, and the
latter does the opposite, i.e. it adds an element to the
multi-resolution support if one of its direct neighbors
is 1. This amounts to a successive shrinking and growing
of objects in the five-dimensional multi-resolution support.

Objects spanning all five dimensions of the
multi-resolution support are not affected by this operation.
However, objects that do not span all five dimensions,
and are therefore likely to be noise artifacts, are
removed. This allows us to use much lower thresholds
of typically $1.5\sigma_j$ during reconstruction. For the
purposes of source finding this increases both
sensitivity and reliability.
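A five-dimensional opening of this kind can be tried out with SciPy's `ndimage` module, which works on arrays of arbitrary dimension. The sketch below is an illustration only: the axis order (j1, j2, channel, y, x), the toy support and the default border handling (everything outside the array counts as 0) are assumptions of this example, not details given in the text.

```python
import numpy as np
from scipy.ndimage import binary_opening, generate_binary_structure

# Hypothetical multi-resolution support with axes (j1, j2, channel, y, x).
support = np.zeros((3, 3, 12, 12, 12), dtype=bool)
# An extended object that is present at all scales.
support[:, :, 3:8, 3:8, 3:8] = True
# An isolated significant coefficient, as produced by a noise peak.
support[1, 1, 10, 10, 10] = True

# Structuring element connecting each element to its direct neighbors
# along each of the five axes (no diagonal neighbors).
structure = generate_binary_structure(5, 1)

# Opening = erosion followed by dilation. With the default border_value
# of 0, elements at the array edges are treated as having empty neighbors.
opened = binary_opening(support, structure=structure)
```

The isolated coefficient disappears, the core of the extended object is retained, and the opening never adds elements that were not in the original support.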



3.3 Memory layout and processing 

The described 2D-1D denoising has been implemented
in C++ using the à trous algorithm in both the
two-dimensional angular and the one-dimensional
spectral transformation. Storing all wavelet
coefficients would take $J_1 \times J_2$ times the
memory of the original data, which can easily exceed
the available computing resources for the typical size
of a data cube of several hundred MByte. Here,
we deal with this major issue by performing the
reconstruction on the fly. Such a serialized method only
needs to store the angular smoothed version $c_{j_1}$, the
angular wavelet coefficients $w_{j_1}$, and the
reconstruction $\tilde{D}$. This way, the memory consumption
of the algorithm is greatly reduced and becomes
independent of the number of angular and spectral scales
analyzed.

Another memory concern is the size of the
multi-resolution support, which has to store
$N_1 \times N_2 \times N_3 \times J_1 \times J_2$ boolean
values, where $N_i$ is the size of the data cube
in pixels along the $i$th axis. For this purpose, the
Standard Template Library (STL) for C++ implements a
specialized container that is able to store boolean
values as individual bits rather than bytes, which makes
the memory footprint of the multi-resolution support
acceptable.
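The saving from bit-wise storage is easy to quantify. The following sketch uses the cube size of the simulations in Section 4 and a hypothetical choice of three angular and four spectral scales; the bit packing is shown with NumPy's `packbits` as a stand-in for the C++ container, which is an analogy chosen for this example rather than the implementation itself.

```python
import numpy as np

n1, n2, n3 = 300, 300, 600  # cube size used in the simulations
j1, j2 = 3, 4               # hypothetical numbers of scales

n_flags = n1 * n2 * n3 * j1 * j2
bytes_one_byte_per_flag = n_flags        # e.g. a plain array of bool
bytes_one_bit_per_flag = -(-n_flags // 8)  # ceiling division by 8

# Packing and unpacking a small boolean array bit by bit.
flags = np.zeros(1000, dtype=bool)
flags[::7] = True
packed = np.packbits(flags)
restored = np.unpackbits(packed, count=flags.size).astype(bool)
```

For these numbers the support shrinks from 648 million bytes to 81 million bytes, a factor of eight, at the cost of slightly more expensive element access.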

The splitting of the different wavelet transformations
makes this denoising scheme a prime candidate
for parallel computing. Using the OpenMP library,
the angular wavelet decomposition of each spectral
channel as well as the spectral wavelet decomposition
of each line of sight was implemented to be computed
in parallel.

Figure 1: Normalized total flux (top) and normalized
integrated signal-to-noise ratio (bottom)
as a function of the number of voxels added up,
ordered by flux in descending order, for a given
source model.

Figure 2: Example of a wavelet reconstruction by
the described algorithm. From top to bottom the
panels show: source model only, source model with
added noise, reconstruction by the algorithm.



4 Simulations 

To examine the various aspects in the sections below,
we created 1000 simple H I galaxy templates using the
GIPSY task galmod. The galaxies were simple disks
with random inclination and maximum rotational
velocity, while the overall brightness profile and
rotation curve were kept fixed.

Noise was generated according to the specifications
of the WALLABY survey (Koribalski & Staveley-Smith
2009; also see Koribalski 2011, this PASA issue). The
models were convolved with a Gaussian beam of
approximately 30" and inserted into data cubes with an
rms noise of 1.8 mJy/beam. The exact specifications
are, however, not important for the simulations since all
tested quantities are given in terms of signal-to-noise
ratios and the algorithm only operates on the pixel
grid of the data. A different beam size should
therefore yield the same results if the beam is sampled
on the same number of pixels.

The algorithm was run on multiple data cubes of
300 by 300 pixels and 600 channels, each of which
contained 20 random galaxies at random positions.

4.1 Source scaling 

Since the proposed algorithm is sensitive to the
complete source signal in the data as opposed to e.g. the
peak flux, we scaled each of the 1000 galaxies to a fixed
set of integrated signal-to-noise ratios. Since the
integrated signal-to-noise ratio (ISNR) depends on the
volume over which it is calculated, we first determined
the optimal volume for each of our models. This was
done by starting with the brightest voxel of the model
and successively adding the next fainter one. This way,
both the total flux and the ISNR initially increase as a
function of the number of voxels added. At a certain point,
the flux in the added voxels becomes very low, since one
adds the faint "outskirts" of the model. At this point
the ISNR starts to decrease, since one adds more noise than
source flux. This behavior can be seen in Figure 1.
The flux at this optimal ISNR is then scaled to yield
the desired ISNRs.
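The search for the optimal volume can be written down compactly. The sketch below assumes that the ISNR of the $N$ brightest voxels is the summed flux divided by $\sigma\sqrt{N}$, i.e. that the noise is independent from voxel to voxel; this definition is an assumption of the example, as are the function names and the Gaussian toy model.

```python
import numpy as np


def optimal_isnr(model, noise_rms):
    """Return the maximum ISNR and the number of voxels at which it occurs.

    Assumes ISNR(N) = sum of the N brightest voxels / (noise_rms * sqrt(N)),
    i.e. uncorrelated noise per voxel (assumption of this sketch).
    """
    flux = np.sort(model.ravel())[::-1]  # brightest voxel first
    cumulative_flux = np.cumsum(flux)
    n_voxels = np.arange(1, flux.size + 1)
    isnr = cumulative_flux / (noise_rms * np.sqrt(n_voxels))
    best = int(np.argmax(isnr))
    return isnr[best], best + 1


def scale_to_isnr(model, noise_rms, target):
    """Scale the model flux so that its optimal ISNR equals `target`."""
    current, _ = optimal_isnr(model, noise_rms)
    return model * (target / current)


# Hypothetical source model: a 3D Gaussian blob on a 21^3 voxel grid.
z, y, x = np.mgrid[-10:11, -10:11, -10:11]
model = np.exp(-(x**2 + y**2 + z**2) / (2 * 2.0**2))

noise_rms = 1.8  # mJy/beam, as in the simulations
scaled = scale_to_isnr(model, noise_rms, target=8.0)
new_isnr, n_best = optimal_isnr(scaled, noise_rms)
```

Because the ISNR is linear in the flux, scaling the model changes the height of the ISNR curve but not the number of voxels at which it peaks.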



4.2 Example reconstruction 

Figure 2 shows the typical result of a reconstruction
by the described algorithm. The top panel shows one
of our scaled templates. In the middle panel, the
simulated noise was added. The bottom panel shows the
same data cube after the application of the described
denoising algorithm. In general, the reconstruction
does not restore the full flux of the inserted model
and also has a changed appearance compared to the
model. This is especially true for low signal-to-noise
sources, since the reconstruction becomes limited by
noise. For more pronounced sources there is, however, a
good correlation between model flux and reconstructed
flux. This is shown in Figure 3, where we plot the
recovered flux of the reconstruction (left panel) as well
as the flux recovered from the data when using the
reconstruction as a mask (right panel) as a function of
ISNR. Especially for ISNR 32 and 16, the recovered flux
matches the model quite well. It is also interesting to
note that the flux from the reconstruction seems to be
closer to the true flux than the flux calculated from the
data. In any case, this evaluation also shows that the masks
obtained from the reconstruction should not be used
as the final masks without further treatment.

Figure 3: Recovered flux as a function of ISNR.
The crosses show the mean of 1000 galaxy models
and the error bars indicate the standard deviation
in each bin. The left panel shows the recovered flux
as obtained from the wavelet reconstruction. The
right panel shows the recovered flux when applying
the wavelet reconstruction as a mask for the data and
calculating the flux from the masked data.

4.3 Robustness 

Since real data does not usually contain ideal noise and 
sources, we evaluated the robustness of the proposed 
algorithm against two common types of data defects: 
baseline ripple and RFI. 

To simulate these effects, we added a sine wave with
a phase that varies along one angular axis to one
simulated data cube. To simulate the presence of RFI
we inserted 30 single-channel spikes in the data and
reran the wavelet reconstruction. The result can be
seen in Figure 4. Clearly, the wavelet reconstruction
is not affected by the rather severe presence of RFI
and baseline ripple. This is because both the baseline
ripple and RFI are present in scales different from
the scales of the sources. By carefully selecting which
scales to reconstruct the data from, we can exclude
many such defects.
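The separation in scale can be made concrete with a one-dimensional toy example along the spectral axis. The sketch assumes a B3-spline kernel, periodic boundaries, a ripple period of 300 channels and four spectral scales; none of these values is taken from the text, and the function name `iuwt_1d` is hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve1d

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed kernel


def iuwt_1d(spectrum, n_scales):
    """1D à trous transform with periodic boundaries."""
    smooth, details = np.asarray(spectrum, dtype=float), []
    for j in range(n_scales):
        kernel = np.zeros(4 * 2**j + 1)
        kernel[:: 2**j] = B3
        nxt = convolve1d(smooth, kernel, mode="wrap")
        details.append(smooth - nxt)
        smooth = nxt
    return details, smooth


channels = np.arange(600)
# Hypothetical baseline ripple with a period of 300 channels.
ripple = np.sin(2 * np.pi * channels / 300.0)
# Hypothetical RFI: a single-channel spike of unit amplitude.
spike = np.zeros(600)
spike[250] = 1.0

ripple_details, ripple_smooth = iuwt_1d(ripple, n_scales=4)
spike_details, _ = iuwt_1d(spike, n_scales=4)

# Largest absolute coefficient per detail scale.
ripple_peak = [np.abs(w).max() for w in ripple_details]
spike_peak = [np.abs(w).max() for w in spike_details]
```

In this toy case the ripple leaves only a small imprint in the four detail scales and remains almost entirely in the smooth band, while the spike has its largest coefficient at the finest scale. Reconstructing from intermediate scales only therefore excludes most of both defects.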

4.4 Completeness and reliability 

The two main figures of merit of a source finder
are its completeness as a function of source signal and
the corresponding reliability. The completeness is
expressed as the percentage of sources that have been
positively identified by the source finder. The
reliability is calculated as the number of true sources divided
by the total number of objects found by the source
finder and gives a measure of the probability that a
given object is a true source rather than a false positive.
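Written out as code, the two measures are simple ratios. The counts used below are hypothetical and only illustrate the definitions; both quantities are returned as percentages here.

```python
def completeness(n_sources_found, n_sources_inserted):
    """Percentage of inserted sources that were recovered."""
    return 100.0 * n_sources_found / n_sources_inserted


def reliability(n_true_objects, n_objects_found):
    """Percentage of catalogued objects that are true sources."""
    return 100.0 * n_true_objects / n_objects_found


# Hypothetical cube: 20 inserted galaxies, 17 of them recovered,
# and one additional object that is a false positive.
cube_completeness = completeness(17, 20)
cube_reliability = reliability(17, 18)
```

A source finder tuned for high completeness typically pays with lower reliability, which is why both numbers are always quoted together.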

To test the performance of the algorithm as a source
finder, we set up a simple source finding pipeline by
using several functions from the SciPy package ndimage.
After the wavelet reconstruction, the ndimage
functions label and find_objects are used to generate
the objects. For this purpose, label searches the data
cube for connected objects, i.e. regions where the flux
is greater than zero, and marks each region with a
unique number. find_objects then generates a list of
slices that each fully contain one of the labeled objects.
Those slices are then used to calculate various
parameters like the total flux of the reconstructed object $F_r$,
the total flux in the original data $F_d$ when applying
the reconstruction as a mask, and various shape
parameters like the size in channels. To check whether
a given object is a true detection, we use a noise-free
version of the same data set and check for intersections
with the noise-free sources above 20% of the peak flux
of a galaxy.

Figure 4: Same as Figure 2 but for corrupted
data. In addition to the simulated noise, the
middle panel shows a sinusoidally varying baseline and
added RFI spikes.
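The object generation described above can be sketched with the two ndimage functions named in the text. The toy cubes, the axis order (channel, y, x) and the dictionary keys are assumptions of this example; the reconstruction is replaced by a hand-made array with one real and one spurious object.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
shape = (20, 16, 16)  # hypothetical cube with axes (channel, y, x)

# Noise-free model containing a single source.
model = np.zeros(shape)
model[5:9, 4:7, 4:7] = 2.0

# Stand-in for the wavelet reconstruction: the source plus one artifact.
recon = np.zeros(shape)
recon[5:9, 4:7, 4:7] = 2.0
recon[14:16, 10:12, 10:13] = 1.0

data = model + rng.normal(size=shape)

# Connected regions with positive flux, each marked with a unique number.
labels, n_objects = ndimage.label(recon > 0)
# One tuple of slices per object, giving its bounding box.
slices = ndimage.find_objects(labels)

# Noise-free voxels above 20% of the peak flux of the galaxy.
truth_mask = model > 0.2 * model.max()

catalogue = []
for index, region in enumerate(slices, start=1):
    member = labels[region] == index  # voxels of this object only
    catalogue.append({
        "flux_recon": recon[region][member].sum(),
        "flux_data": data[region][member].sum(),
        "n_channels": region[0].stop - region[0].start,
        "n_voxels": int(member.sum()),
        "is_true": bool(np.any(member & truth_mask[region])),
    })
```

By default `label` connects voxels only through shared faces; a different connectivity can be requested through its `structure` argument.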

4.4.1 False positives 

To achieve a reasonably high completeness even for 
very faint sources, one has to use very low thresholds 
which will lead to an increasing number of false pos- 
itives. After the contrast enhancement, the identifi- 
cation of false positives is a key task of every source 
finder. 

Since the source of the false positives, i.e. noise
peaks, is greatly suppressed by the algorithm, they
mostly stem from the wavelet coefficients at larger
scales, where the noise peaks are spread out over a
sufficiently large volume to not be removed by the
morphological opening. This leads to a very low
reconstructed total flux $F_r$, and they are therefore
easily separated from the real sources.

Figure 5: Distribution of the detection parameters
$F_r$, the flux in the reconstruction, $F_d$, the flux
in the detection as measured on the original data,
and the number of voxels a given object occupies.
The green points indicate true detections, the red
points false detections. The dashed lines indicate
10 mJy km s$^{-1}$ for $F_r$ and $F_d$, respectively.

Figure 5 shows the correlation for three parameters
from the simulation with resolved sources (see
Section 4.4.3), all directly measured from either the
reconstructed or the original data. It is evident that
all false positives cluster in one region of the
respective plots and that they exhibit very low $F_r$ and $F_d$.
Therefore, by applying a simple cut in the parameter
space of the detections, the number of false positives
can be greatly reduced without sacrificing much of the
completeness. By only taking sources that have both
$F_r$ and $F_d$ larger than 10 mJy km s$^{-1}$, we exclude 96%
of all false positives but only 5% of the true positives. The
area in which sources fulfill this condition is indicated
by the dashed lines in Figure 5.
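Applied to a catalogue, the cut is a single boolean expression. The flux values below are hypothetical and only illustrate the selection; the threshold is the one quoted in the text.

```python
import numpy as np

FLUX_CUT = 10.0  # mJy km/s, threshold quoted in the text

# Hypothetical catalogue of five detections.
flux_recon = np.array([2.0, 45.0, 8.0, 120.0, 15.0])  # F_r
flux_data = np.array([1.0, 60.0, 30.0, 110.0, 9.0])   # F_d

# A detection is kept only if both fluxes exceed the threshold.
accepted = (flux_recon > FLUX_CUT) & (flux_data > FLUX_CUT)
n_accepted = int(accepted.sum())
```

Requiring both fluxes to pass removes detections that are bright in only one of the two measurements, such as the third and fifth entries of this toy catalogue.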

4.4.2 Point sources 

We tested the completeness as a function of ISNR for
both extended sources and point sources. To
obtain realistic line profiles for the point sources, the
extended models were summed up in each channel and
the resulting spectrum was convolved with the beam. The
resulting point source model was then scaled to the
desired ISNR.

The results of the run are summarized in Figure 6.
Starting from ISNR 0.5, we increased the ISNR by
a factor of two from bin to bin. Since the drop in
completeness between ISNR 8 and 4 is rather sharp,
we ran additional simulations in between those values.

The lower panel in Figure 6 shows that the source
finder is indeed sensitive to the extended signal of the
sources, as we detect sources with a larger line width
but lower peak signal-to-noise ratio than the sources we
do not detect at smaller line widths.

The reliability for these results is close to 100%.
We achieved this by applying the cut discussed in the
previous section. Note that Figure 5 was made from
the run with resolved sources.

Figure 6: Results from the simulation with
spatially unresolved sources. The top panel shows the
completeness as a function of the ISNR as probed
in our simulations. The lower panel shows the
completeness as a function of (logarithmic) peak
signal-to-noise ratio and line width of the source.
The white areas in the lower plot have not been
tested. The reliability for this plot is 99%.

4.4.3 Extended sources 

The second run was made with the extended
galaxies, which are clearly resolved by the simulated
observations. Figure 7 shows the results of this run in a
similar fashion to Figure 6. We again discard all
detections with $F_r$ or $F_d$ below 10 mJy km s$^{-1}$
to reach a reliability of 97%.

The boundary from 100% completeness to 0% is
substantially smoother than in the case of the point
sources. This behavior comes from the fact that
extended sources can be extended in the angular domain
while at the same time being very narrow in the
spectral domain, e.g. a galaxy seen face-on. This makes
it substantially easier to detect galaxies with narrow
line widths as long as they are extended in the angular
domain. This is also evident from the lower panel in
Figure 7, where one can see that narrow line width
galaxies are detected down to a lower peak signal-to-noise
ratio than large line width galaxies.

For this simulation, we also encounter a phenomenon
usually called fragmentation. We calculate it as the
percentage of sources that have been detected two or
more times. This can occur when a source with a very
large line width is split into two detections.
Furthermore, as mentioned in Section 3.1, wavelet denoising
schemes are generally prone to produce artifacts
during the denoising process. Because of the simple way
we determine whether a source is a true or a false
detection, those artifacts can also cause multiple
detections of the same source. We are therefore confident
that the fragmentation rate will decrease somewhat as
the object identification process improves.

Figure 7: Same as Figure 6 for spatially extended
sources. The reliability for this plot is 97% and
the fragmentation 3%.



5 Summary



We have shown how 2D-1D wavelet denoising schemes
can be used for source finding. Even with very
simple post-processing of the denoised data, we set up an
efficient source finding pipeline. The robustness of the
algorithm in particular suggests that it will work
well on real data, which is certainly the next test to
be passed.

Even though the splitting of the wavelet
transformation into a 2D and a 1D part avoids some of the
difficulties that arise with anisotropic sources, it is far from
perfect. A better approach to denoising would be the use
of a full 3D curvelet transformation (Candès et al. 2007;
Ying et al. 2005). This transformation is, however,
computationally much more difficult and demanding on the
available hardware. We therefore think that our
approach is a good compromise between sensitivity and
computational complexity. With more powerful
hardware or more optimized algorithms, however, denoising
with the curvelet transform might become feasible
even for the large data sets we expect from future
radio telescopes.

Furthermore, we would like to stress that even though
this algorithm was developed with H I surveys in mind, it
will in principle work for every kind of data that is
similar to the data product of such a survey, e.g. other
spectral line surveys.